Calculate the standard deviation after using data cleansing techniques and replacing missing values with the mean (2 decimal places):
Cleaning the data involved replacing the bad values ("broken", "N/A", "NaN") with the average of the valid values (18.35). The standard deviation of the cleaned data was then found to be 13.51.
# Include and execute your code here
# Flag the non-numeric placeholders with a sentinel (-999), then cast to float
problem = problem.with_columns(
    problem=pl.col('problem').replace(["broken", "N/A", "NaN"], -999)
).cast(pl.Float64)

# Mean of the valid (non-negative) values only
avg = problem.filter(pl.col('problem') >= 0).mean().item()

# Replace the sentinel values with that mean
problem = problem.with_columns(
    pl.when(pl.col('problem') < 0)
    .then(avg)
    .otherwise(pl.col('problem'))
    .alias('problem')
)

# np.std defaults to the population standard deviation (ddof=0)
stdev = np.std(np.array(problem['problem']))
print(f"The average used to replace the missing values: {avg:.4}\nThe standard deviation found on the data: {stdev:.4}")
The average used to replace the missing values: 18.35
The standard deviation found on the data: 13.51
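The effect of mean imputation on the standard deviation can be checked on a small example. The sketch below uses only the standard library and made-up readings (the values in `raw` are hypothetical, not the assignment data). It compares the population standard deviation, which is what `np.std` computes by default, with the sample standard deviation and with the spread of the valid values alone.

```python
import statistics

# Hypothetical raw readings; the strings stand in for bad entries.
raw = [10, "broken", 20, "N/A", 30, 40, "NaN"]
BAD = {"broken", "N/A", "NaN"}

# Mean of the valid values only.
valid = [float(v) for v in raw if v not in BAD]
avg = statistics.fmean(valid)

# Replace every bad entry with that mean.
cleaned = [avg if v in BAD else float(v) for v in raw]

pop_sd = statistics.pstdev(cleaned)       # divides by n (matches np.std default)
sample_sd = statistics.stdev(cleaned)     # divides by n - 1
sd_valid_only = statistics.pstdev(valid)  # spread before imputation

print(f"mean used for imputation: {avg:.2f}")
print(f"population sd after imputation: {pop_sd:.2f}")
print(f"sample sd after imputation: {sample_sd:.2f}")
print(f"population sd of valid values only: {sd_valid_only:.2f}")
```

Because the imputed entries sit exactly at the mean, they add nothing to the sum of squared deviations while increasing the count, so the standard deviation after imputation is smaller than that of the valid values alone.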
Question 2
Use the pivot table, group by and/or aggregate functions to recreate the data frame of building counts for houses of 1 and 2 stories (in the rows of the table) and with a garage that fits 1, 2, 3, and 4 cars or fewer (the columns of the table) from the housing data. Display the recreated data table and display the results in a chart of your choice:
The data was recreated using group_by and aggregate functions.
> Create training and test data using train_test_split with the following arguments: test_size = .33 and random_state = 1936.
> Use GradientBoostingClassifier() to build a machine learning model
> Report your accuracy and a feature importance plot with the top 10 most important features
The model is 93% accurate at predicting the airport code based on the rest of the data.
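The three steps in the prompt can be sketched end to end as follows. This is a minimal illustration on synthetic data from `make_classification`, not the data used in the report. The feature names are hypothetical, and `random_state=0` on the classifier is an added assumption so that the run is repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and target.
X, y = make_classification(
    n_samples=600, n_features=15, n_informative=5, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Split with the arguments given in the prompt.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1936
)

# Fit the gradient boosting model (random_state is an assumption, see above).
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Accuracy on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.2%}")

# Ten most important features, largest importance first.
top10 = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)[:10]
for name, importance in top10:
    print(f"{name}: {importance:.3f}")
```

The `top10` pairs can be passed to any bar chart function to produce the feature importance plot.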
# Include and execute your code here
url2 = "https://github.com/byuidatascience/data4missing/raw/master/data-raw/flights_missing/flights_missing.json"
flights_json = pl.read_json("flights_missing.json")